First let's load some libraries, and some data.
import imageio # for image i/o
import numpy as np # general numeric operations and matrices
import io # for io.BytesIO, turns the data stream into a file-like object
import matplotlib.pyplot as plt # allows us to plot things
import urllib.request as request # downloads a file from the web
import math # for things like pi
%matplotlib inline
image_url = 'http://isle.illinois.edu/speech_web_lg/coursematerials/ece417/18fall/cat400.png'
cat400=imageio.imread(io.BytesIO(request.urlopen(image_url).read())).astype(float)
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat400.astype('uint8'))
plt.title('Image of a cat with resolution {}'.format(cat400.shape),fontsize=18)
Notice that, in this case, the size of the image is $(240,424,3)$, meaning that it has 240 rows, 424 columns, and 3 colors. imageio creates a numpy array indexed in that order (rows, columns, colors); other packages index in other orders, so you have to be careful each time you use a new package. In any case, we can index this image as $x[m,n,k]$,
where we'll use $m$ for the row index, $n$ for the column index, and $k$ for the color index.
Downsampling just means we remove $D-1$ out of every $D$ samples, thus $$y[m,n,k]=x[mD,nD,k]$$
(M,N,K)=cat400.shape
cat100=np.zeros((M,N,K),dtype=cat400.dtype)
for m in range(0,int(M/4)):
    for n in range(0,int(N/4)):
        for k in range(0,K):
            cat100[m,n,k]=cat400[4*m,4*n,k]
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat100.astype('uint8'))
plt.title('Cat Decimated to 60x106x3',fontsize=18)
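The triple loop above is a direct transcription of the formula $y[m,n,k]=x[mD,nD,k]$. As a sanity check, NumPy slicing produces exactly the same samples in one line; this sketch uses a small stand-in array with the same shape as cat400 rather than the image itself:

```python
import numpy as np

# Stand-in for the image: shape (rows, cols, colors), like cat400.
x = np.arange(240 * 424 * 3, dtype=float).reshape(240, 424, 3)
D = 4

# Loop version, as in the cell above.
y_loop = np.zeros((240 // D, 424 // D, 3))
for m in range(240 // D):
    for n in range(424 // D):
        for k in range(3):
            y_loop[m, n, k] = x[D * m, D * n, k]

# Vectorized version: keep every D-th row and every D-th column.
y_slice = x[::D, ::D, :]

print(np.allclose(y_loop, y_slice))
```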
t4=np.linspace(0,N-1,N,dtype='int16')
f1=plt.figure(figsize=(14,4))
plt.plot(t4,cat400[160,:,2],t4,cat100[40,:,2])
plt.title('Row 160 of fullsize image, row 40 of decimated image, Blue plane')
Remember that the multi-dimensional Fourier transform is $$X(\omega_1,\omega_2,\omega_3)=\sum_{n_1}\sum_{n_2}\sum_{n_3}x[n_1,n_2,n_3]e^{-j(\omega_1n_1+\omega_2n_2+\omega_3n_3)}$$
Since this is a separable operation, we can learn a lot by just focusing on one dimension at a time. For example, let's define the DTFT of a row to be:
$$X(m,\omega,k)=\sum_n x[m,n,k]e^{-j\omega n}$$$$Y(m,\omega,k)=\sum_n y[m,n,k]e^{-j\omega n}$$But what is the relationship between $Y(\omega)$ and $X(\omega)$? Hint: it involves aliasing.
This is easiest to analyze in two steps. First, multiply $x[n]$ by $$p[n]=\sum_{m=-\infty}^\infty \delta[n-mD]$$ in order to get the signal $$v[n]=p[n]x[n]$$
pulsetrain = np.zeros((M,N,K))
pulsetrain[::4,::4,:]=1
cat400_times_pulsetrain = cat400 * pulsetrain
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat400_times_pulsetrain.astype('uint8'))
plt.title('Cat Multiplied by Pulse Train',fontsize=18)
The Fourier transform of a pulse train is a pulse train:
$$p[n]=\sum_{m=-\infty}^\infty \delta[n-mD]$$which has a transform of $$P(\omega)=\frac{2\pi}{D}\sum_{k=0}^{D-1} \delta\left(\omega-\frac{2\pi k}{D}\right)$$
pulsetrain_transformed = np.fft.fft(pulsetrain[0,:,0])
n = np.arange(N)
omega = np.linspace(0,2*np.pi,N,endpoint=False)
fig, ax = plt.subplots(2,1, figsize=(14,7))
ax[0].plot(n,pulsetrain[0,:,0])
ax[0].set_title('Impulse train in the time domain, with one impulse every 4 samples')
ax[0].set_xlabel('$n$')
ax[1].plot(omega,np.abs(pulsetrain_transformed))
ax[1].set_title(r'Its Fourier transform is a pulse train, with one impulse per $2\pi/4$ radians')
ax[1].set_xlabel(r'$\omega$')
fig.tight_layout()
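The four impulses in the plot can also be checked numerically. Assuming $D$ divides $N$, the $N$-point DFT of a period-$D$ impulse train is $N/D$ at bins that are multiples of $N/D$, and zero elsewhere (the discrete analog of the $P(\omega)$ formula above):

```python
import numpy as np

N, D = 424, 4
p = np.zeros(N)
p[::D] = 1                        # impulse train, one impulse every D samples

P = np.fft.fft(p)
peaks = P[:: N // D]              # bins 0, N/D, 2N/D, 3N/D

print(np.allclose(peaks, N / D))  # each peak has height N/D

mask = np.ones(N, dtype=bool)
mask[:: N // D] = False
print(np.allclose(P[mask], 0))    # every other bin is zero
```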
Remember that $v[n]=p[n]x[n]$. Remember, also, that multiplying two signals in the time domain is the same as convolving them in the frequency domain:
$$v[n]=x[n]p[n] \Leftrightarrow V(\omega)=\frac{1}{2\pi} P(\omega)\ast X(\omega)$$Since convolving any signal with an impulse is the same as shifting the signal, we get that
$$P(\omega)=\frac{2\pi}{D}\sum_{k=0}^{D-1} \delta\left(\omega-\frac{2\pi k}{D}\right)$$implies that
$$V(\omega) = \frac{1}{D} \sum_{k=0}^{D-1}X\left(\omega-\frac{2\pi k}{D}\right)$$
cat400_transformed = np.fft.fft(cat400[160,:,2])
cat400_times_pulsetrain_transformed = np.fft.fft(cat400_times_pulsetrain[160,:,2])
fig, ax = plt.subplots(2,1, figsize=(14,7))
ax[0].plot(omega,20*np.log10(np.abs(cat400_transformed)))
ax[0].set_title('Fourier transform of the original high-res cat image')
ax[0].set_ylabel('deciBels')
ax[1].plot(omega,20*np.log10(np.abs(cat400_times_pulsetrain_transformed)))
ax[1].set_title('Fourier transform of the cat times the pulse train -- notice the aliasing!!!')
ax[1].set_xlabel(r'$\omega$')
ax[1].set_ylabel('deciBels')
fig.tight_layout()
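The aliasing formula for $V(\omega)$ can be verified numerically on any 1-D signal: in DFT bins, multiplying by the impulse train averages $D$ circularly shifted copies of the spectrum. This sketch uses a random signal (not the cat row), assuming $D$ divides $N$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 424, 4
x = rng.standard_normal(N)

# Multiply by an impulse train with period D.
p = np.zeros(N)
p[::D] = 1
v = p * x

# V[j] = (1/D) * sum_k X[j - k*N/D], a circular shift in DFT bins.
X = np.fft.fft(x)
V_predicted = sum(np.roll(X, k * N // D) for k in range(D)) / D

print(np.allclose(np.fft.fft(v), V_predicted))
```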
Notice that $v[n]$ has all of the same samples as the downsampled image $y[n]$ -- they're just spread out, with 3 zeros between each nonzero sample. Getting rid of the zeros doesn't change the spectrum, so we can compute it as follows:
$$y[n] = v[nD]$$$$Y(\omega)=\sum_n y[n]e^{-j\omega n}=\sum_{m=nD}v[m]e^{-j\omega m/D}=V\left(\frac{\omega}{D}\right)$$In other words, $Y(\omega)=V(\omega/D)$, which basically means that we just relabel the frequency axis -- anything that happens at $\omega=2\pi/D$ on the frequency axis for $V(\omega)$ happens at $\omega=2\pi$ on the frequency axis for $Y(\omega)$. In particular, both $V(\omega)$ and $Y(\omega)$ have the same aliasing:
$$Y(\omega)= \left(\frac{1}{D}\right)\sum_{d=0}^{D-1} X\left(\frac{\omega-2\pi d}{D}\right)$$
cat100_transformed = np.fft.fft(cat100[40,0:106,2])
fig = plt.figure(1,figsize=(14,10))
plt.subplot(311)
plt.plot(omega,20*np.log10(abs(cat400_transformed)))
plt.title('DTFT of row 160: Original')
plt.subplot(312)
plt.plot(omega,20*np.log10(abs(cat400_times_pulsetrain_transformed)))
plt.title('DTFT of row 160: Aliased in Frequency')
plt.subplot(313)
plt.plot(omega[::4],20*np.log10(abs(cat100_transformed)))
plt.title('DTFT of row 40, which used to be row 160: Downsampled')
fig.tight_layout()
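The relabeling $Y(\omega)=V(\omega/D)$ has a concrete DFT consequence we can check: the $(N/D)$-point DFT of the downsampled signal equals the first $N/D$ bins of the $N$-point DFT of the zero-stuffed signal $v[n]$. A quick sketch with a random signal rather than the cat row:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 424, 4
x = rng.standard_normal(N)

v = np.zeros(N)
v[::D] = x[::D]          # v[n] = p[n] x[n]
y = v[::D]               # y[n] = v[nD], length N/D

# Y and V are the same function with a relabeled frequency axis,
# so their DFT bins coincide over one period of Y.
print(np.allclose(np.fft.fft(y), np.fft.fft(v)[: N // D]))
```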
We can avoid aliasing by lowpass filtering the signal prior to downsampling.
An ideal lowpass filter with cutoff frequency $\omega_c$ is given by $$h[n]=\frac{\omega_c}{\pi}\mbox{sinc}(\omega_c n)=\frac{\sin(\omega_c n)}{\pi n}$$ When we create a lowpass filter by hand, we have to be careful with the $n=0$ sample: $$h[0]=\frac{\omega_c}{\pi}$$ In order to avoid aliasing, we need $$\omega_c=\frac{\pi}{D}$$ We can approximate an ideal lowpass filter by creating a reasonably long windowed FIR filter; below we use a Hamming window, which has much lower sidelobes than a rectangular window. Then we can lowpass filter by convolving each row and each column: $$x_{lpf}[n]=h[n]\ast x[n]$$ $$y_{lpf}[n] = x_{lpf}[nD]$$
n16 = np.arange(-7.5,8.5)
omega16 = np.linspace(0,2*np.pi,16,endpoint=False)
lpf = 0.25 * np.hamming(16) * np.sin(np.pi*n16/4) / (np.pi*n16/4)
lpf16_transformed = np.fft.fft(lpf)
fig=plt.figure(1,figsize=(14,6))
plt.subplot(211)
plt.plot(n16,lpf)
plt.title(r'Hamming-Windowed Lowpass Filter at $\omega_c=\pi/4$, length = 16 samples')
plt.xlabel('$m$')
plt.subplot(212)
plt.plot(omega16,np.abs(lpf16_transformed))
plt.title(r'Hamming-Windowed Lowpass Filter at $\omega_c=\pi/4$, length = 16 samples')
plt.xlabel(r'$\omega$')
fig.tight_layout()
The scipy.signal.convolve2d function implements convolution in two dimensions, but direct 2D convolution is slow: convolving an $M\times N$ image with an $L\times L$ filter costs O($MNL^2$) operations.
A much faster method, when the filter is separable, is to first convolve each column, then convolve each row, with a length-$L$ filter; each pass costs O($MNL$), so the total is only O($MNL$). This is called "separable filtering".
Here, we use np.convolve with mode='same', so each filtered row and column has the same length as the input (the signal is implicitly zero-padded at the edges).
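Separability is easy to verify straight from the definition. This sketch (using a small random stand-in image and a Hamming kernel, not the cat) shows that 2-D convolution with the outer-product kernel $h[m]h[n]$ matches column-then-row filtering:

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.standard_normal((12, 15))
h = np.hamming(5)
H = np.outer(h, h)                     # separable 2-D kernel h[m]h[n]

M, N = img.shape
L = len(h)

# Direct 2-D (full) convolution, straight from the definition:
# scatter each input sample, scaled by the kernel.
out_2d = np.zeros((M + L - 1, N + L - 1))
for m in range(M):
    for n in range(N):
        out_2d[m:m + L, n:n + L] += img[m, n] * H

# Separable version: filter each column, then each row.
tmp = np.apply_along_axis(lambda c: np.convolve(h, c), 0, img)      # (M+L-1, N)
out_sep = np.apply_along_axis(lambda r: np.convolve(h, r), 1, tmp)  # (M+L-1, N+L-1)

print(np.allclose(out_2d, out_sep))
```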
tmp = np.zeros((M,N,K))
cat400_lowpassfiltered = np.zeros((M,N,K))
for k in range(0,K):
    for m in range(0,M):
        tmp[m,:,k] = np.convolve(lpf, cat400[m,:,k], mode='same')
    for n in range(0,N):
        cat400_lowpassfiltered[:,n,k] = np.convolve(lpf, tmp[:,n,k], mode='same')
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat400_lowpassfiltered.astype('uint8'))
plt.title('Cat that has been convolved with an LPF',fontsize=18)
Now let's downsample the filtered image. This should give a spectrum with much less aliasing from downsampling.
catf100 = cat400_lowpassfiltered[::4,::4,:]
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(catf100.astype('uint8'))
plt.title('Decimated cat: lowpass filtered, then downsampled',fontsize=18)
Upsampling is the process of creating a larger image, from a smaller image, by just inserting zeros: $$z[m,n,k] = \left\{\begin{array}{ll} y[m/D,n/D,k] & m/D,~n/D~\mbox{are integers}\\ 0 & \mbox{otherwise} \end{array}\right.$$
Again, the problem shows up in the spectrum: $$Z(\omega) = Y(D\omega)$$ Upsampling compresses the frequency axis, so one period of $Z(\omega)$ contains $D$ images of the spectrum of $Y(\omega)$.
This time, though, the artifact is much more visible in the image itself: the image is mostly black, with one spot of color per $D\times D$ square.
cat_upsampled_double = np.zeros(cat400.shape,dtype=cat400.dtype)
for m in range(0,60):
    for n in range(0,106):
        for k in range(0,K):
            cat_upsampled_double[4*m,4*n,k]=cat100[m,n,k]
cat_upsampled = np.ndarray.astype(np.maximum(0,np.minimum(255,cat_upsampled_double)),dtype='uint8')
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat_upsampled)
plt.title('Cat that has been upsampled without interpolation')
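As with downsampling, the zero-insertion loop has a one-line slice-assignment equivalent. A sketch, using a small stand-in array in place of cat100 and the same $D=4$:

```python
import numpy as np

D = 4
y = np.arange(60 * 106 * 3, dtype=float).reshape(60, 106, 3)  # stand-in image

# Insert D-1 zeros between samples in each spatial dimension.
z = np.zeros((60 * D, 106 * D, 3))
z[::D, ::D, :] = y

print(np.allclose(z[::D, ::D, :], y))   # the original samples survive
print(np.isclose(z.sum(), y.sum()))     # everything else is zero
```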
The solution is obvious: rather than just filling zeros between the upsampled samples, we need to fill in some meaningful value. The first solution to consider is piece-wise constant interpolation, sometimes called zero-order hold (ZOH).
$$z[m,n,k]=y[\mbox{int}(m/D),\mbox{int}(n/D),k]$$This results in some aliasing, but not as bad as the upsampled cat.
cat_pwc_double = np.zeros(cat400.shape,dtype=cat400.dtype)
for m in range(0,240):
    for n in range(0,424):
        for k in range(0,K):
            cat_pwc_double[m,n,k]=cat100[int(m/4),int(n/4),k]
cat_pwc = np.ndarray.astype(np.maximum(0,np.minimum(255,cat_pwc_double)),dtype='uint8')
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat_pwc)
plt.title('Cat interpolated using PWC interpolation')
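Zero-order hold also has a vectorized equivalent: np.repeat duplicates each sample $D$ times along an axis, which matches the int(m/D) indexing above. A sketch on a tiny stand-in image:

```python
import numpy as np

D = 4
y = np.arange(6 * 5 * 3, dtype=float).reshape(6, 5, 3)  # tiny stand-in image

# Loop version, as in the cell above: z[m,n,k] = y[int(m/D), int(n/D), k]
z_loop = np.zeros((6 * D, 5 * D, 3))
for m in range(6 * D):
    for n in range(5 * D):
        for k in range(3):
            z_loop[m, n, k] = y[int(m / D), int(n / D), k]

# np.repeat along each spatial axis does the same thing.
z_rep = np.repeat(np.repeat(y, D, axis=0), D, axis=1)

print(np.allclose(z_loop, z_rep))
```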
If piece-wise constant (PWC) interpolation is much better than plain upsampling, then piece-wise linear (PWL) interpolation should be better still.
PWL interpolation in two dimensions is called "bilinear" interpolation. In this case the word "bilinear" just means linear in two different directions.
A sample $z[m,n]$ in the output is closest to four different samples of the input: $$y\left[\mbox{int}\left(\frac{m}{D}\right),\mbox{int}\left(\frac{n}{D}\right)\right],$$ $$y\left[\mbox{int}\left(\frac{m}{D}\right)+1,\mbox{int}\left(\frac{n}{D}\right)\right],$$ $$y\left[\mbox{int}\left(\frac{m}{D}\right),\mbox{int}\left(\frac{n}{D}\right)+1\right],$$ and $$y\left[\mbox{int}\left(\frac{m}{D}\right)+1,\mbox{int}\left(\frac{n}{D}\right)+1\right].$$ The first thing to do is to calculate the fractional contribution of each of these four samples to the output: $$e = \frac{m}{D} - \mbox{int}\left(\frac{m}{D}\right)$$ $$f = \frac{n}{D} - \mbox{int}\left(\frac{n}{D}\right)$$
Then we can create the output sample as
$$z[m,n,k] = \left(1-e\right)\left(1-f\right)\times y\left[\mbox{int}\left(\frac{m}{D}\right),\mbox{int}\left(\frac{n}{D}\right),k\right]$$$$+ e\left(1-f\right)\times y\left[\mbox{int}\left(\frac{m}{D}\right)+1,\mbox{int}\left(\frac{n}{D}\right),k\right]$$$$+ \left(1-e\right)f \times y\left[\mbox{int}\left(\frac{m}{D}\right),\mbox{int}\left(\frac{n}{D}\right)+1,k\right]$$$$+ ef\times y\left[\mbox{int}\left(\frac{m}{D}\right)+1,\mbox{int}\left(\frac{n}{D}\right)+1,k\right]$$
cat_pwl_double = np.zeros(cat400.shape,dtype='double')
for m in range(0,240):
    for n in range(0,424):
        for k in range(0,K):
            m1 = int(m/4)
            n1 = int(n/4)
            m2 = min(int(M/4-1),m1+1)
            n2 = min(int(N/4-1),n1+1)
            e = 0.25*m - m1
            f = 0.25*n - n1
            cat_pwl_double[m,n,k]=(1-e)*(1-f)*cat100[m1,n1,k]+e*(1-f)*cat100[m2,n1,k]+(1-e)*f*cat100[m1,n2,k]+e*f*cat100[m2,n2,k]
cat_pwl = np.ndarray.astype(np.maximum(0,np.minimum(255,cat_pwl_double)),dtype='uint8')
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat_pwl)
plt.title('Cat upsampled using piece-wise linear interpolation')
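Because the weights factor as $(1-e)(1-f)$ and so on, bilinear interpolation is separable: interpolating linearly along one axis and then the other gives the same result as the four-neighbor formula. A single-channel sketch, checked against np.interp (both versions clamp at the edges):

```python
import numpy as np

D = 4
y = np.arange(6.0 * 5).reshape(6, 5)          # tiny single-channel image
M2, N2 = 6 * D, 5 * D

# Four-neighbor formula, as in the cell above (with edge clamping).
z4 = np.zeros((M2, N2))
for m in range(M2):
    for n in range(N2):
        m1, n1 = m // D, n // D
        m2, n2 = min(5, m1 + 1), min(4, n1 + 1)
        e, f = m / D - m1, n / D - n1
        z4[m, n] = ((1-e)*(1-f)*y[m1, n1] + e*(1-f)*y[m2, n1]
                    + (1-e)*f*y[m1, n2] + e*f*y[m2, n2])

# Separable version: linear interpolation along rows, then columns.
cols = np.arange(N2) / D
rows = np.arange(M2) / D
tmp = np.apply_along_axis(lambda r: np.interp(cols, np.arange(5), r), 1, y)
zsep = np.apply_along_axis(lambda c: np.interp(rows, np.arange(6), c), 0, tmp)

print(np.allclose(z4, zsep))
```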
Notice that piece-wise-linear interpolation is just like upsampling the cat, and then convolving with the piece-wise linear interpolation filter:
$$h_{PWL}[n] = \left\{\begin{array}{ll} \frac{D-|n|}{D} & -D\le n\le D\\ 0 & \mbox{otherwise}\end{array}\right. $$Similarly, piece-wise constant interpolation is just like upsampling the cat, and then convolving with the piece-wise constant interpolation filter:
$$h_{PWC}[n] = \left\{\begin{array}{ll} 1 & 0\le n<D\\ 0 & \mbox{otherwise}\end{array}\right.$$
zero_sample = 6
n_axis = np.linspace(-zero_sample,zero_sample, 2*zero_sample+1, dtype='int16')
h_pwc = np.zeros(13)
h_pwl = np.zeros(13)
for n in range(0,4):
    h_pwc[zero_sample+n]=1
    h_pwl[zero_sample+n]=0.25*(4-n)
    h_pwl[zero_sample-n]=0.25*(4-n)
plt.figure(1,figsize=(14,6))
plt.subplot(211)
plt.stem(n_axis,h_pwc)
plt.title('PWC Interpolation is like convolving with the PWC interpolator')
plt.subplot(212)
plt.stem(n_axis,h_pwl)
plt.title('PWL Interpolation is like convolving with the PWL interpolator')
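The claim that PWL interpolation equals zero-stuffing followed by convolution with the triangle filter can be checked in 1-D against np.interp. The two only differ near the right edge, where the triangle filter has no second endpoint to lean on, so we compare the interior:

```python
import numpy as np

D = 4
y = np.array([0.0, 3.0, 1.0, 4.0, 2.0])     # short 1-D signal
L = len(y)

# Zero-stuff: D-1 zeros between samples.
u = np.zeros(L * D)
u[::D] = y

# Triangle (PWL) interpolation filter: (D-|n|)/D, dropping the zero endpoints.
h = np.concatenate([np.arange(1, D + 1), np.arange(D - 1, 0, -1)]) / D
z = np.convolve(u, h, mode='same')           # odd-length kernel, centered

# Reference: piece-wise linear interpolation of the same samples.
ref = np.interp(np.arange(L * D) / D, np.arange(L), y)

# Agree up to the last original sample (beyond it, np.interp holds the
# final value while the triangle filter decays to zero).
print(np.allclose(z[: (L - 1) * D + 1], ref[: (L - 1) * D + 1]))
```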
PWC interpolation suffers from obvious blocky artifacts. PWL interpolation smooths away most of those, but not all. We can get rid of all of them, and get the lowpass-filtered cat back again exactly, by filtering the upsampled cat using an ideal sinc function.
$$z[n]=D^2 h_{LPF}[n]\ast y[n]$$Multiplying by a factor of $D^2$ is necessary because we're trying to construct $D^2$ output samples from every one input sample.
from scipy import ndimage # for ndimage.convolve1d
cat_si_double=np.zeros((240,424,3),dtype='double')
cat_half_filtered_double=np.zeros((240,424,3),dtype='double')
for k in range(0,K):
    for m in range(0,M):
        cat_half_filtered_double[m,:,k]=4*ndimage.convolve1d(cat_upsampled_double[m,:,k],lpf,mode='wrap')
    for n in range(0,N):
        cat_si_double[:,n,k]=4*ndimage.convolve1d(cat_half_filtered_double[:,n,k],lpf,mode='wrap')
cat_si = np.ndarray.astype(np.maximum(0,np.minimum(255,cat_si_double)),dtype='uint8')
plt.figure(1,figsize=(int(424*0.05),int(240*0.05)))
plt.imshow(cat_si)
plt.title('Cat upsampled using sinc interpolation',fontsize=18)
Well, TBH I like the PWL interpolation best. Sinc-squared would be a good option, if we could get rid of the grid artifact.